ABSTRACT

The Applied Science Laboratory under the Directorate of Metrology has a
service function to provide laser power and energy measurement
certifications for all Air Force agencies in need of such service. This
report describes how the data collected in a laser power meter exchange
program between the Directorate of Metrology and Brooks AFB, Texas, is
analyzed in determining the errors in the Brooks intercomparison
measurement process.


INTRODUCTION 


The Applied Science Laboratory, ACMC, has a requirement to establish
a CW Laser Power Meter Measurement Assurance Program between this
laboratory and several Air Force Research and Development Laboratories.

This report describes and proposes a simple, easy-to-apply criterion
which allows one to objectively characterize and quantitatively
evaluate a measurement assurance program of the type required to support
the Air Force R&D Laboratories.

The  criterion  described  is  based  upon  a  linear  regression  analysis 
of  the  output  volts  (X)  from  the  standard  power  meter  compared  to  the 
output  volts  (Y)  of  another  power  meter  called  the  transfer  standard. 

Each  comparison  between  X  and  Y  is  treated  as  a  coordinate  pair  (X,Y). 

The  analysis  method  used  in  this  report  follows  that  described  in 
NBS  Handbooks  91  and  300,  Chapter  5-A,  "Problems  and  Procedures  for 
Functional  Relationships".  The  chapter  and  title  in  both  books  are 
the  same. 
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SUMMARY 


Using the voltage measurements from the standard and the transfer
standard as X and Y respectively, and treating X and Y as linearly
related, the following quantities were determined.

    Mean X                                           4.4755
    Std Dev X                                        .1073
    Median X                                         4.45
    Mean Y                                           4.429
    Std Dev Y                                        (not legible)
    Median Y                                         4.41
    Intercept b_0                                    -0.64587
    Slope b_1                                        1.13392
    Variance in Y                                    .0005186
    Variance in slope b_1                            .0023691
    Variance in intercept b_0                        .047480
    Error in the slope b_1                           ±.51%
    Error in the intercept b_0                       ±2.29%
    Whole line 95% confidence interval W_1           ±.0382
    Whole line 95% confidence error in W_1           ±.93%
    On line 95% confidence interval W_2              ±.0301
    On line 95% confidence error in W_2              ±.73%
    Single future 95% confidence interval W_3        ±.0565
    Single future 95% confidence error in W_3        ±1.37%
    Correlation coefficient r                        .98

The resulting equation was determined to be:

    Y = -0.64587 + 1.13392X

DISCUSSION


In  this  intercomparison  process  a  laser  power  meter  transfer 
standard  was  compared  to  a  laboratory  standard  by  alternately  placing 
each  into  the  laser  beam.  The  voltage  readings  were  taken  digitally 
at  the  end  of  a  100  second  exposure  period.  See  Time  Phase  Diagram, 
Fig.  1. 

Twenty voltage intercomparison measurements were made comparing
the transfer standard voltage Y to the standard voltage X. In this
analysis the X values were treated as independent variables and the
Y values were treated as dependent variables. The measured values of
X and Y were treated as 20 pairs of independent measurements, since
the X values were measured on a different instrument from that of the
Y values. Each comparison between X and Y is treated as a coordinate
pair (X,Y).

A linear relationship is assumed between the two variables X and Y:

$$ Y = b_0 + b_1 X \qquad (1) $$

The method of least squares regression analysis is used for
determining the linear relationship between the two variables X and Y
(regression line of Y on X).

A  rough  plot  of  the  values  as  coordinate  pairs  (X,Y)  showed  that 
they  approximate  a  straight  line. 

For those who have access to computers or hand calculators,
the operations for determining the coefficients $b_0$ and $b_1$ in
equation (1) above can be made quite simple.
BASIC WORKSHEET FOR ALL TYPES OF LINEAR RELATIONSHIPS

X denotes Voltage, standard          Y denotes Voltage, transfer standard

    ΣX   = 89.5100                   ΣY   = 88.5800
    Mean X = 4.4755                  Mean Y = 4.4290

    Number of points: n = 20

    Step (1)  ΣXY            = 396.6880
         (2)  (ΣX)(ΣY)/n     = 396.43979
         (3)  S_xy           = .24821

         (4)  ΣX^2           = 400.8209        (7)  ΣY^2         = 392.6116
         (5)  (ΣX)^2/n       = 400.60201       (8)  (ΣY)^2/n     = 392.32082
         (6)  S_xx           = .21890          (9)  S_yy         = .29078

        (10)  b_1 = S_xy/S_xx         = 1.13392     (14)  (S_xy)^2/S_xx = .28144
        (11)  Mean Y                  = 4.429       (15)  (n - 2) s_Y^2 = .00934
        (12)  b_1 (Mean X)            = 5.07487     (16)  s_Y^2         = .0005186
        (13)  b_0 = Mean Y - b_1 (Mean X) = -.64587       s_Y           = .022773

Equation of the line:

    Y = b_0 + b_1 X = -.64587 + 1.13392X

Estimated variance of the slope:

    s_b1^2 = s_Y^2 / S_xx = .0023691          s_b1 = .048674

Estimated variance of intercept:

    s_b0^2 = s_Y^2 [1/n + (Mean X)^2 / S_xx] = .047480          s_b0 = .21790

Note: The following are algebraically identical:

    S_xx = Σ(X - Mean X)^2;   S_yy = Σ(Y - Mean Y)^2;   S_xy = Σ(X - Mean X)(Y - Mean Y).

Ordinarily, in hand computation, it is preferable to compute as shown in
the steps above. Carry all decimal places obtainable, i.e., if data are
recorded to two decimal places, carry four places in Steps (1) through (9)
in order to avoid losing significant figures in subtraction.

"Copied from NBS Handbook 91, pg 5-10, U.S. Govt Printing Office,
Washington, D.C."
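The worksheet arithmetic can be cross-checked in a few lines of Python. The following is a minimal sketch and not part of the original report: the function name `linear_worksheet` and the dictionary keys are illustrative assumptions, and the inputs are the sums printed on the worksheet. Because the sketch carries full floating-point precision rather than hand-rounded intermediate figures, the last digits may differ slightly from the worksheet entries.

```python
import math


def linear_worksheet(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Least-squares line of Y on X from raw sums (worksheet steps 1-16)."""
    s_xy = sum_xy - sum_x * sum_y / n            # step (3)
    s_xx = sum_x2 - sum_x ** 2 / n               # step (6)
    s_yy = sum_y2 - sum_y ** 2 / n               # step (9)
    x_bar = sum_x / n
    y_bar = sum_y / n
    b1 = s_xy / s_xx                             # step (10): slope
    b0 = y_bar - b1 * x_bar                      # step (13): intercept
    var_y = (s_yy - s_xy ** 2 / s_xx) / (n - 2)  # steps (14)-(16)
    var_b1 = var_y / s_xx                        # variance of the slope
    var_b0 = var_y * (1.0 / n + x_bar ** 2 / s_xx)  # variance of the intercept
    return {
        "s_xy": s_xy, "s_xx": s_xx, "s_yy": s_yy,
        "b1": b1, "b0": b0,
        "var_y": var_y, "s_y": math.sqrt(var_y),
        "var_b1": var_b1, "var_b0": var_b0,
    }


# Sums taken from the worksheet (n = 20 comparison pairs).
fit = linear_worksheet(20, 89.51, 88.58, 396.6880, 400.8209, 392.6116)
for name, value in fit.items():
    print(f"{name:7s} {value:.6f}")
```

Working from the sums rather than from the individual pairs mirrors the hand procedure, which is why the Handbook's advice about carrying extra decimal places applies here as well.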
The percent error in the slope $b_1$ at the 95% confidence interval
is determined. The Student "t" value for 95% confidence and 18
degrees of freedom is t(.95,18) = 2.101. See a copy of the table in
Appendix C, "Percentiles of the 't' Distribution".

$$ \%_{b_1} = \frac{t(.95,18)\, s_{b_1} / \sqrt{n}}{\bar{X}} \times 100
            = \frac{2.101\,(.048674) / \sqrt{20}}{4.4755} \times 100
            = \pm .51\% $$

Likewise, the percent error at the 95% confidence interval for
the intercept $b_0$ is

$$ \%_{b_0} = \frac{t(.95,18)\, s_{b_0} / \sqrt{n}}{\bar{X}} \times 100
            = \frac{2.101\,(.2179) / \sqrt{20}}{4.4755} \times 100
            = \pm 2.29\% $$
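The two percent-error figures can be reproduced numerically. This is a small sketch under the assumption that the percent error is the t value times the standard error, divided by the square root of n and by the mean of X; that reading reproduces the report's ±.51% and ±2.29%. The function name `percent_error` is hypothetical.

```python
import math


def percent_error(t_value, std_error, n, x_bar):
    """Percent error as used in the report: t * s / sqrt(n), relative to mean X."""
    return t_value * std_error / math.sqrt(n) / x_bar * 100.0


T_95_18 = 2.101   # Student t for 95% confidence, 18 degrees of freedom
N = 20            # number of comparison pairs
X_BAR = 4.4755    # mean standard voltage

slope_pct = percent_error(T_95_18, 0.048674, N, X_BAR)
intercept_pct = percent_error(T_95_18, 0.21790, N, X_BAR)
print(f"slope error     +/- {slope_pct:.2f} %")
print(f"intercept error +/- {intercept_pct:.2f} %")
```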
Estimate $W_1$, the 95% confidence interval for the whole line.
See Figure 2.

$$ W_1 = \sqrt{2F}\; s_Y \left[ \frac{1}{n} + \frac{(X - \bar{X})^2}{S_{xx}} \right]^{1/2} $$

F, the percentile of the F distribution, is taken from Table A-5 in the
back of NBS Handbook 91: $F_{.95}(2,18) = 3.55$. See Appendix B.

Determine $W_1$ for several values of X (4.2 ≤ X ≤ 4.8).

      X       ±W_1       Y       ±%

    4.20     .0382     4.12     .93
    4.25     .0322     4.17
    4.30     .0265     4.23     .63
    4.35     .0212     4.29
    4.40     .0167     4.34     .38
    4.45     .0140     4.40
    4.50     .0139     4.46     .31
    4.55     .0167     4.51
    4.60     .0211     4.57     .46
    4.65     .0264     4.63
    4.70     .0321     4.68     .69
    4.75     .0381     4.74
    4.80     .0442     4.80     .92

The interval ±.0382 or ±.93% is the widest interval falling
within 4.2 ≤ X ≤ 4.8, the whole line range. Consequently this is the
interval within which we are 95% certain that all values Y will fall for
the whole line.
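The whole-line interval can be tabulated directly from its formula. The sketch below is illustrative only: the constant and function names are assumptions, the inputs are the rounded worksheet values, and F = 3.55 is the standard tabulated 95th percentile of F with (2, 18) degrees of freedom.

```python
import math

S_Y = 0.022773      # standard deviation of Y about the line
N = 20              # number of comparison pairs
X_BAR = 4.4755      # mean standard voltage
S_XX = 0.21890      # sum of squared deviations of X
F_95_2_18 = 3.55    # F percentile, 95%, (2, 18) degrees of freedom
B0, B1 = -0.64587, 1.13392


def fitted_y(x):
    """Value of the fitted line at x."""
    return B0 + B1 * x


def whole_line_w1(x):
    """Half-width W1 of the 95% confidence band for the whole line at x."""
    spread = math.sqrt(1.0 / N + (x - X_BAR) ** 2 / S_XX)
    return math.sqrt(2.0 * F_95_2_18) * S_Y * spread


# Tabulate X from 4.20 to 4.80 in steps of 0.05.
for k in range(13):
    x = 4.20 + 0.05 * k
    w1 = whole_line_w1(x)
    print(f"{x:.2f}  {w1:.4f}  {fitted_y(x):.2f}  {100.0 * w1 / fitted_y(x):.2f}")
```

The band is narrowest near the mean of X and widens toward both ends of the range, which is the behavior the table shows.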


Estimate $W_2$, the 95% confidence interval for a single point on the
line (i.e. the mean value of Y corresponding to a chosen value of X).
See Figure 2.

    α = .05
    1 - α/2 = .975

The t (the percentile of the t distribution) is taken from Table A-4
in NBS Handbook 91; $t_{.975}$ for 18 degrees of freedom is 2.101.
See Appendix C.

Determine

$$ W_2 = t_{1-\alpha/2}\; s_Y \left[ \frac{1}{n} + \frac{(X - \bar{X})^2}{S_{xx}} \right]^{1/2} $$

for values of X (4.2 ≤ X ≤ 4.8).

      X       ±W_2       Y       ±%

    4.20     .0301     4.12     .73
    4.25     .0254
    4.30     .0209     4.23     .49
    4.35     .0167
    4.40     .0132     4.34     .30
    4.45     .0110
    4.50     .0110     4.46     .25
    4.55     .0131
    4.60     .0166     4.57     .36
    4.65     .0208
    4.70     .0253     4.68     .54
    4.75     .0300
    4.80     .0349     4.80     .73
Consequently we may state, for a given value of X, that the mean value
of Y will fall within ±.0349 or ±.73% 95% of the time.

Estimate the 95% confidence interval $W_3$ for a single future value
of Y for a single chosen value of X. See Figure 2.

$$ W_3 = t_{1-\alpha/2}\; s_Y \left[ 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{S_{xx}} \right]^{1/2} $$

The future values are taken to mean that, if the same measurement
process is repeated in the future, these future values of Y for chosen
values of X are expected to fall within $\pm W_3$.

$$ t_{.975} = 2.101 $$

Determine several values of $W_3$ for chosen values of X (4.2 ≤ X ≤ 4.8).

      X       ±W_3       Y       ±%

    4.20     .0565     4.12     1.37
    4.25     .0542     4.17     1.30
    4.30     .0522     4.23     1.23
    4.35     .0507     4.29     1.18
    4.40     .0496     4.34     1.14
    4.45     .0491     4.40     1.12
    4.50     .0491     4.46     1.10
    4.55     .0496     4.51     1.10
    4.60     .0507     4.57     1.11
    4.65     .0522     4.63     1.13
    4.70     .0541     4.68     1.16
    4.75     .0565     4.74     1.19
    4.80     .0592     4.80     1.23
The interval ±.0565 or ±1.37% is the widest interval falling
within 4.2 ≤ X ≤ 4.8, the whole line range. Consequently future
determinations of Y values using the least squares fitting method for the
same measurement process are expected to fall within this interval
95% of the time.
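The on-line interval $W_2$ and the single-future-value interval $W_3$ differ only by the extra "1 +" term under the square root. The following sketch, with hypothetical function names and the rounded worksheet values as inputs, evaluates both so the two tables can be checked side by side.

```python
import math

S_Y = 0.022773    # standard deviation of Y about the line
N = 20            # number of comparison pairs
X_BAR = 4.4755    # mean standard voltage
S_XX = 0.21890    # sum of squared deviations of X
T_975_18 = 2.101  # Student t, 97.5th percentile, 18 degrees of freedom


def on_line_w2(x):
    """Half-width W2: 95% interval for the mean of Y at a chosen x."""
    return T_975_18 * S_Y * math.sqrt(1.0 / N + (x - X_BAR) ** 2 / S_XX)


def single_future_w3(x):
    """Half-width W3: 95% interval for one future Y at a chosen x."""
    return T_975_18 * S_Y * math.sqrt(1.0 + 1.0 / N + (x - X_BAR) ** 2 / S_XX)


# Tabulate X from 4.20 to 4.80 in steps of 0.05.
for k in range(13):
    x = 4.20 + 0.05 * k
    print(f"{x:.2f}  W2 = {on_line_w2(x):.4f}  W3 = {single_future_w3(x):.4f}")
```

Because of the added unit term, $W_3$ never drops below $t \cdot s_Y$, which is why its table varies much less across the range than the $W_2$ table does.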

The  degree  of  correlation  between  the  variables  X  and  Y  is 
measured  by  the  sample  correlation  coefficient  r. 

Determine the correlation coefficient r for the sample:

$$ r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
     = \frac{.24821}{\sqrt{(.21890)(.29078)}}
     = .9838 $$

$$ r = .98 $$
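As a quick numerical check of the correlation coefficient, the sketch below recomputes r from the three worksheet sums of squares. The variable names are illustrative, not the report's.

```python
import math

S_XY = 0.24821  # worksheet step (3)
S_XX = 0.21890  # worksheet step (6)
S_YY = 0.29078  # worksheet step (9)

# Sample correlation coefficient between standard and transfer-standard voltages.
r = S_XY / math.sqrt(S_XX * S_YY)
print(f"r = {r:.4f}")
```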

This method of analysis shows that the mean voltage of the transfer
standard Y can be computed as a function of a given standard
voltage X by the following equation:

    Y = -0.64587 + 1.13392X

This equation does not produce a single value of Y for a single
value of X. It produces the mean of n Y values. One would not
expect to make a single measurement and get the computed value of Y; one
must make n measurements, e.g. 5 measurements, the mean of which would
fall on the Y line. See Figure 2. To check this statement, go to the
table of calculated Y values (pg. 12) and select 5 consecutive Y values,
e.g. 4.32, 4.33, 4.34, 4.35 and 4.37; the mean of this group is 4.34. The
X value opposite 4.34 is 4.40. Consequently, if one maintained the
standard voltage X at 4.40 and made a single measurement, it is equally
likely that any one of the five Y values would occur.

[Table: Calculated values of Y in terms of X, Y = -0.64587 + 1.13392X]

To get a clear understanding of the criteria used in evaluating
this intercomparison process, it is important to correctly interpret
the meaning of the values $W_1$, $W_2$ and $W_3$. There are errors inherent
in any measurement process; consequently, in this measurement, the
process procedures, the operator's skill, the environmental conditions
and the equipment used contribute to inherent errors which cause variation
in the recorded voltages of the standard X and the transfer standard Y.
These variations are expressed as $\pm W_1$, $\pm W_2$ and $\pm W_3$, and
they fluctuate in the Y direction above and below the Y line. From the
computations and Figure 2, it is noted that the value $W_1$ is chosen such
that it envelops the worst deviation, which occurs at X = 4.2 and X = 4.8.
Consequently the process variation is ±.0382 volts above and below
the Y line, or ±.93% above and below the Y line. It is noted from the
computation also that $W_1$ decreases as values of X are held near the
mean X = 4.476, which itself is limited by the inability of the laser
to remain fixed at one power setting. $W_2$ defines the variation in
the Y line itself. Consequently, for any Y value located directly on
the Y line itself, its worst variation is $W_2$ = ±.0301 or ±.73%, which
occurs also at X = 4.2 and X = 4.8. Its minimum deviation also occurs
around X = 4.476, similar to $W_1$.
$W_3$ defines the interval inside which future and subsequent
measurement variations in Y are expected to fall. Consequently, for future
and subsequent application of the same intercomparison process using
the least squares method, the predicted variations in Y are ±.0565
or ±1.37% above and below the Y line.

It is noted that all values of $W_1$, $W_2$ and $W_3$ are computed at the
95% confidence interval. The use of the 95% or 99% confidence interval
is arbitrary; however, since the 95% confidence interval is more
frequently used, we have chosen to use it in our computations. By
choosing the 95% confidence interval for our computations,
we are 95% confident that this measurement process
will yield W values which will fall within the computed $W_1$, $W_2$ and
$W_3$ intervals.

This  also  means  that  we  have  chosen  to  live  with  the  risk  that 
5%  of  the  W  values  are  expected  to  fall  outside  the  computed  intervals. 

Twenty  comparison  measurements  were  made  during  this  test  mainly 
because  this  was  the  first  time  this  process  was  being  evaluated. 

These  data  were  collected  over  a  period  of  two  days.  Now  that  the 
initial  evaluation  has  been  completed,  future  evaluations  will  be 
done  by  computing  the  W  intervals  based  upon  10  data  pairs  X,  Y. 

Invariably, after the data have been recorded, one is faced with
the problem of what to do with one or two suspected values which will
tend to shift the average away from that which is desired by the
experimenter.

Now, in order to have a meaningful measurement program with
credibility, one must adopt and accept some degree of objectivity
and uniformity when it comes to the rejection of suspected data values.

The method suggested here is a simple one, so chosen to encourage
its use, rather than a more complex method which would discourage
its use.

The method applied here in testing whether to accept or reject
a suspected value is described in NBS Handbook 300 Volume 1, p. 349-520,
titled "Rejection of Outlying Observations". See instructions from
this article in Appendix D.

First list the X and Y values in the order of size.


      X        Y

    4.31     4.21
    4.36     4.28
    4.36     4.28
    4.37     4.33
    4.39     4.34
    4.42     4.36
    4.42     4.37
    4.43     4.38
    4.43     4.39
    4.45     4.41
        median
    4.45     4.41
    4.46     4.42
    4.48     4.45
    4.50     4.46
    4.54     4.50
    4.56     4.51
    4.57     4.54
    4.66     4.64
    4.67     4.64
    4.68     4.66

Since we have 20 values the appropriate ratio is:

$$ r_{22} = \frac{X_3 - X_1}{X_{n-2} - X_1}, \qquad n = 14 \text{ to } 30 $$

Referring to the above list of X values, $X_1 = 4.31$, $X_3 = 4.36$
and $X_{18} = 4.66$.

$$ r_{22} = \frac{4.36 - 4.31}{4.66 - 4.31} = .14 $$

Referring to Table I, Appendix D, Testing for Extreme Observations,
under the column marked α = 5 percent, which has the 95% confidence
interval critical values, it is noted that at n = 20 the critical value is
.45. Since the above computed value $r_{22}$ = .14 is less than the
critical value .45, the suspected value of 4.31 is not judged a mistake and
may not be excluded. In like manner the decision is the same when
the same procedure is applied to 4.21 as the suspected value in the
Y column.


If there is a need to use this procedure to exclude more than
one value in ten, one would be cautioned to stop and to investigate
the measurement process for unsuspected disturbances which may be
producing the outlying values. Of course, one always has the option
to run the test again.
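The rejection test applied to the twenty recorded voltages can be sketched as follows. This is an illustrative check rather than the report's own procedure listing: the function name `r22_smallest` is hypothetical, the data are the ordered X and Y lists given earlier, and .45 is the 5 percent critical value for n = 20 quoted in the text.

```python
X_VALUES = [4.31, 4.36, 4.36, 4.37, 4.39, 4.42, 4.42, 4.43, 4.43, 4.45,
            4.45, 4.46, 4.48, 4.50, 4.54, 4.56, 4.57, 4.66, 4.67, 4.68]
Y_VALUES = [4.21, 4.28, 4.28, 4.33, 4.34, 4.36, 4.37, 4.38, 4.39, 4.41,
            4.41, 4.42, 4.45, 4.46, 4.50, 4.51, 4.54, 4.64, 4.64, 4.66]
CRITICAL_5_PERCENT_N20 = 0.45


def r22_smallest(values):
    """Ratio r22 = (x3 - x1) / (x[n-2] - x1) for a suspected smallest value."""
    x = sorted(values)
    if not 14 <= len(x) <= 30:
        raise ValueError("r22 is intended for sample sizes 14 to 30")
    return (x[2] - x[0]) / (x[-3] - x[0])


for label, data in (("X", X_VALUES), ("Y", Y_VALUES)):
    ratio = r22_smallest(data)
    verdict = "retain" if ratio < CRITICAL_5_PERCENT_N20 else "investigate"
    print(f"{label}: r22 = {ratio:.2f} -> {verdict} the smallest value")
```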
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CONCLUSIONS 


The method of analysis presented in this report affords a simple,
easy-to-apply criterion for objectively characterizing the
quality of an intercomparison type measurement process.

The method described allows the measurement process to be
objectively evaluated by computing the process $W_1$, $W_2$ and $W_3$
intervals.

As the quality of the process improves, the W intervals will
decrease. If, in subsequent tests, the W intervals increase,
this would indicate a decrease in the quality of the process. For
example, variations in procedures, variations in operator
characteristics, variations in equipment functional characteristics and
variations in the laboratory environmental conditions could be
objectively evaluated by monitoring the magnitude of the variations in
the process W intervals. See Figure 3.

One is cautioned that indiscriminately throwing away one or
two data points introduces a subjective operator's bias into the
measurement process, which in turn reduces the credibility and integrity
of the process. Consequently, if there is a need to throw away one or
two data points, all participating operators are advised to always use
an objective data rejection method of the type described in this report.

This method of evaluation applies equally to:

(1) intercomparison tests in the same laboratory, by comparing
W intervals of previous tests to W intervals of current tests, and

(2) intercomparison tests in different laboratories, by
comparing the W intervals computed by one laboratory to the W intervals
computed by the other laboratory, as long as both laboratories have
measured the same transfer standard and have agreed to compare at a
specified confidence interval, e.g. 95%.
... are known to have a limited range of values of X (which is an SII
Relationship). Table 5-1 gives a brief summary characterization of SI and
SII Relationships. Detailed problems and procedures with numerical
examples are given for SI relationships in Paragraph 5-5.1 and for SII
relationships in Paragraph 5-5.2.

BASIC WORKSHEET FOR ALL TYPES OF LINEAR RELATIONSHIPS

    Step (1)  ΣXY
         (2)  (ΣX)(ΣY)/n
         (3)  S_xy = Step (1) - Step (2)

         (4)  ΣX^2                             (7)  ΣY^2
         (5)  (ΣX)^2/n                         (8)  (ΣY)^2/n
         (6)  S_xx = Step (4) - Step (5)       (9)  S_yy = Step (7) - Step (8)

        (10)  b_1 = S_xy/S_xx = Step (3) ÷ Step (6)
        (11)  Mean Y
        (12)  b_1 (Mean X)
        (13)  b_0 = Mean Y - b_1 (Mean X) = Step (11) - Step (12)
        (14)  (S_xy)^2/S_xx
        (15)  (n - 2) s_Y^2 = Step (9) - Step (14)
        (16)  s_Y^2 = Step (15) ÷ (n - 2)

Estimated variance of the slope:

    s_b1^2 = s_Y^2 / S_xx = Step (16) ÷ Step (6)

Estimated variance of intercept:

    s_b0^2 = s_Y^2 [1/n + (Mean X)^2 / S_xx]

Note: The following are algebraically identical:

    S_xx = Σ(X - Mean X)^2;   S_yy = Σ(Y - Mean Y)^2;   S_xy = Σ(X - Mean X)(Y - Mean Y).

Ordinarily, in hand computation, it is preferable to compute as shown in
the steps above. Carry all decimal places obtainable, i.e., if data are
recorded to two decimal places, carry four places in Steps (1) through (9)
in order to avoid losing significant figures in subtraction.

From NBS Handbook 91, pg 5-10, U.S. Govt Printing Office, Washington, D.C.
Rejection of Outlying Observations

National Bureau of Standards, Washington, D. C.
(Received November 24, 1952)

This paper makes available two of the modern statistical tests for possible
rejection of outlying observations. These two methods have been selected
because they apply in a majority of the actually occurring situations and
because they are so easy to use.

A PERENNIAL problem vexing the experimenter is that of rejection of
suspected data. For one hundred years attempts at the
solution of this problem have been advanced,
most of them to be themselves rejected as suspect.
Fortunately, modern statistical theory has
proposed useful, reliable methods for objectively
rejecting deviant values. However, the solution
is far from complete at present.

This paper makes available to the physicist
two of the modern statistical tests for possible
rejection of outlying observations. These two
methods have been selected because they apply
in a majority of the actually occurring situations
and because they are so easy to use.

THE PROBLEM

Here is a common problem facing experimenters. The typical scientist,
X. Perry Menter, makes a number (say five) of repeated measurements
of some unknown quantity. The smallest
value (or the largest) is so far removed from the
other four that he suspects that it may be in
error. However, Perry has no specific knowledge
that a mistake actually did occur. Let us assume
that he has no previous data from which to
estimate the precision of measurement. How can
he decide from the values themselves whether the
suspected value is in error or not?

The answer seems clear. He should consider
the suspected value as in error when it seems
too far from the other four values. But how can
he judge when it is "too far from the other four
values"?

A LOGICAL APPROACH

Here is a simple, logical, objective criterion.
Suppose Perry could somehow make millions of
sets of five observations each. Suppose, too,
that he could guarantee that none of these
observations had any mistakes. Call a typical set
$x_1, x_2, x_3, x_4, x_5$, where the x's are arranged in
order of size, so that $x_1 \le x_2 \le \cdots \le x_5$. Now a
logical measure of the distance between the
smallest value and the other four values is

$$ r_{10} = \frac{x_2 - x_1}{x_5 - x_1}, $$

i.e., the ratio of the interval between the
suspected and adjacent value to the total range.

Now Perry records with what frequency,
among his millions of sets of five values each,
different values of $r_{10}$ occur. He finds that a value
of $r_{10}$ larger than 0.780 occurs one time in one
hundred. He then reasons this way:

"I have found that among sets of five observations
each (containing no mistakes) a value of
$r_{10}$ larger than 0.780 is quite unlikely (occurs only
once in one hundred). If now, in my future
experiments I get a set of five observations for
which $r_{10}$ is larger than 0.780, I will conclude
that my smallest observation is in error."

(Author's footnote: Sylvania Electric Products, Inc., Hicksville.)

CONFIDENCE IN THE TEST

This seems reasonable. But what confidence
can Perry have in such a procedure? How often
will he consider as mistaken a perfectly good
observation? How often will he consider
acceptable an incorrect observation?

Clearly, from the way in which he derived the
test, he will classify a perfectly good smallest
observation as mistaken once among one hundred
sets of five each, on the average. But there is no
general answer to the question of how often he
will let pass a mistaken observation. This
depends on how "mistaken" the mistaken
observation is. If a very large error were made, his
test would tend to reject the observation almost
certainly. If a very small error were made, his
test would tend to reject the observation with a
small probability.

(pg 349-520, U.S. Govt Printing Office.)

Figure 1 gives some idea of the performance of
$r_{10}$ in detecting mistaken observations. It is
based on a sampling experiment in which samples
of five from a normal population with mean μ
and standard deviation σ were contaminated
with values drawn from a normal population
with mean (μ + λσ) and standard deviation σ.
The ordinate shows the percent discovery of
contaminators (the proportion of the time the
contaminating population provides an extreme
value and the test discovers this value) while
the abscissa shows λ, the magnitude of the
shift (error) of the contaminator in standard
deviations.

We said above that once in every 100 sets of
values (on the average) Perry would consider as
mistaken a perfectly good observation. If he were
to reject this observation and then compute
the mean and standard deviation of the
remaining values, these would be biased estimates.
In addition, when a good observation is rejected,
any further statistical tests of significance will
become less reliable. This is the price that he
must pay for improving the data in the cases
where a mistaken observation is removed.


FIG. 1. Performance of r test. The ordinate shows the
percent discovery of contaminators, while the abscissa
shows λ, the magnitude of the shift (error) of the
contaminator in standard deviations. From W. J. Dixon,
Ann. Math. Stat. 21, No. 4, 493 (1950).

MATHEMATICAL DERIVATION

Of course, 0.780, the value of $r_{10}$ that is exceeded
by chance 1 percent of the time (called
the 1 percent level of significance of $r_{10}$), is not
determined by actually making millions of sets of
five observations each. Rather it may be
calculated mathematically (1) with even greater accuracy
than if millions of sets of five observations had
been used. The basic assumption is that the
repeated measurements would follow the normal
distribution.

LARGER SAMPLE SIZE

For sample sizes larger than seven, slight
modifications in the $r_{10}$ statistic result in a more
sensitive test. Thus for sample size n = 8, 9, or 10,

$$ r_{11} = \frac{x_2 - x_1}{x_{n-1} - x_1} $$

is superior to $r_{10}$. Similarly for n = 11, 12, or 13,

$$ r_{21} = \frac{x_3 - x_1}{x_{n-1} - x_1} $$

is superior. Finally for n = 14, 15, ..., 30,

$$ r_{22} = \frac{x_3 - x_1}{x_{n-2} - x_1} $$

is best.

(1) Dixon, Ann. Math. Stat. 22, No. 1, 68-78 (1951).

USE OF TABLE I

Let us now define r as the appropriate statistic
among $r_{10}$, $r_{11}$, $r_{21}$, and $r_{22}$ according to the
sample size. Table I gives critical values of r for
significance levels α = 5 percent and α = 1 percent,
for sample sizes from n = 3 to 30.

Thus for example for n = 8 and α = 5 percent,
the table gives a critical value for r (in this case
$r_{11}$) of 0.554. This means that in 100 sets of 8
observations each, free of mistakes, five values of
$r_{11}$ will be larger than 0.554, on the average.

What if Perry suspects the acceptability of the
largest observation in a set? In this case, he
simply considers the observations as numbered
in the reverse order and proceeds as before.
Table I. Testing for extreme observations (no past data).*

                                                          Critical values
    Statistic                         Sample size n   α = 5 percent   α = 1 percent

    r_10 = (x_2 - x_1)/(x_n - x_1)          3             0.941           0.988
                                            4             0.765           0.889
                                            5             0.642           0.780
                                            6             0.560           0.698
                                            7             0.507           0.637

    r_11 = (x_2 - x_1)/(x_{n-1} - x_1)      8             0.554           0.683
                                            9             0.512           0.635
                                           10             0.477           0.597

    r_21 = (x_3 - x_1)/(x_{n-1} - x_1)     11             0.576           0.679
                                           12             0.546           0.642
                                           13             0.521           0.615

    r_22 = (x_3 - x_1)/(x_{n-2} - x_1)     14             0.546           0.641
                                           15             0.525           0.616
                                           16             0.507           0.595
                                           17             0.490           0.577
                                           18             0.475           0.561
                                           19             0.462           0.547
                                           20             0.450           0.535
                                           21             0.440           0.524
                                           22             0.430           0.514
                                           23             0.421           0.505
                                           24             0.413           0.497
                                           25             0.406           0.489
                                           26             0.399           0.486
                                           27             0.393           0.475
                                           28             0.387           0.469
                                           29             0.381           0.463
                                           30             0.376           0.457

* By permission from W. J. Dixon and F. J. Massey, Introduction
to Statistical Analysis (McGraw-Hill Book Company, Inc., New York,
1951).


Why are two significance levels given? The
reason is that no one significance level is
appropriate to all problems. For example, consider
these two cases:

(a) Additional observations are not possible.

(b) Additional observations are possible.

In case (a) for many problems it might be
appropriate to compute r and test it at the 1
percent level of significance. If the particular
observed value of r is larger than the tabulated
value for α = 1 percent, it might then be a good
idea to exclude that observation.

In case (b), for many situations a reasonable
procedure might be to test r at the 5 percent
level. If the sample value of r is significant at the
5 percent level, one or more additional
observations would be taken. If the observation
originally suspected remained outlying, it would be
tested again, using the combined set of
observations. This time, however, the r test would be
performed at the 1 percent level of significance.
If the outlier were significantly deviant at the
1 percent level, it would be rejected. It should
be noted that among many sets tested in this
way, the proportion of sets in which a perfectly
good largest value will thus be rejected will be
less than 1 percent. This is because the
observation has a "second chance" before it is finally
rejected.

SUMMARY

A set of n observations is made. No previous
data are available from which to estimate the
variability of a measurement. What is a rational
procedure for testing whether the largest (or
smallest) of the set is too deviant to be explained
by the ordinary errors of measurement?

Rank the n observations in order of size from
smallest to largest, if the smallest observation is
suspected,

$$ x_1 \le x_2 \le \cdots \le x_n ; $$

reverse the numbering system if the largest is
suspected. Next compute

$$ r = r_{10} = \frac{x_2 - x_1}{x_n - x_1} \quad \text{if } n = 3 \text{ to } 7 $$

$$ r = r_{11} = \frac{x_2 - x_1}{x_{n-1} - x_1} \quad \text{if } n = 8 \text{ to } 10 $$

$$ r = r_{21} = \frac{x_3 - x_1}{x_{n-1} - x_1} \quad \text{if } n = 11 \text{ to } 13 $$

$$ r = r_{22} = \frac{x_3 - x_1}{x_{n-2} - x_1} \quad \text{if } n = 14 \text{ to } 30. $$

Table I may be used to determine how likely
it is to get as large a value of r as actually
obtained, simply by chance. A procedure that might
be appropriate for many problems is as follows.

(a) No additional observations possible. In this
case, compare the computed r with the value
in Table I at the 1 percent level. If the
computed value of r is larger than the tabulated
value, exclude the deviant observation.
Otherwise, do not.

(b) Additional observations possible. In this
case, compare the computed r with the value
of r at the 5 percent level. If the computed value
of r is larger than the value, take one or more
additional observations. Otherwise accept the
suspected value without taking additional
observations.

If, in the enlarged set (containing all the
original and the additional observations), the
previously suspected value remains outlying,
compute r for the enlarged set. This time
compare it with the value at the 1 percent level.
If the computed value exceeds the table value,
exclude the outlier; otherwise do not.

EXAMPLES

1. In a preliminary experiment, Silas N. Tist
makes 5 determinations of the velocity of light
in vacuum by a new method, obtaining 299 792,
299 780, 299 795, 299 786, 299 820 (km/sec). Si
N. Tist suspects the last value, 299 820, as being
mistaken since it is so much larger than the other
values. Before going on with additional
experimentation, Si wishes to decide whether 299 820
is mistaken or not. What shall he do?

Since no previous data are available from
which to compute the precision of measurement
by this new method, the r test is appropriate.
The first step is to arrange the five values in
order of size: 299 780, 299 786, 299 792, 299 795,
299 820. Then

$$ r = r_{10} = \frac{299\,820 - 299\,795}{299\,820 - 299\,780} = \frac{25}{40} = 0.625. $$

Since this is less than 0.780, the 0.01 point of
r for n = 5, Si N. Tist concludes that 299 820 is
not mistaken.

2. Using the Atwood machine, Norris G. Neer
makes determinations of g, the acceleration of
gravity, in his college course in experimental
physics. N. G. Neer's values are: 986, 964, 989,
1000, 987, 909, 999 (cm/sec^2). He suspects 909
as being inconsistent with the other values.
Shall he accept it, or shall he experiment further?

He computes

$$ r = r_{10} = \frac{x_2 - x_1}{x_7 - x_1} = \frac{964 - 909}{1000 - 909} = \frac{55}{91} = 0.604. $$

This value lies between the 0.01 and the 0.05
points of r for n = 7. Hence N. G. Neer makes an
additional determination and gets a new value
of 971.

Since 909 remains outlying in the enlarged set
of eight, he computes r for this set of eight. Now
r ($r_{11}$) is 0.611. Since it is smaller than the 1
percent level of r for n = 8, N. G. Neer accepts 909
and uses all eight values.
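The choice of statistic by sample size, and both worked examples, can be expressed compactly in code. The sketch below is an illustration and not part of the reprinted article: the function name `dixon_ratio` and its `suspect` argument are assumptions made here, and the index choices follow the four ratios defined in the article's summary.

```python
def dixon_ratio(values, suspect="smallest"):
    """Return the r statistic appropriate to the sample size (3 to 30).

    The observations are ranked so that the suspected extreme comes first;
    for a suspected largest value the ordering is simply reversed.
    """
    x = sorted(values, reverse=(suspect == "largest"))
    n = len(x)
    if 3 <= n <= 7:        # r10 = (x2 - x1) / (xn - x1)
        near, far = 1, n - 1
    elif 8 <= n <= 10:     # r11 = (x2 - x1) / (x[n-1] - x1)
        near, far = 1, n - 2
    elif 11 <= n <= 13:    # r21 = (x3 - x1) / (x[n-1] - x1)
        near, far = 2, n - 2
    elif 14 <= n <= 30:    # r22 = (x3 - x1) / (x[n-2] - x1)
        near, far = 2, n - 3
    else:
        raise ValueError("sample size must be between 3 and 30")
    return (x[near] - x[0]) / (x[far] - x[0])


# Example 1: velocity of light, largest value suspected.
light = [299792, 299780, 299795, 299786, 299820]
r_light = dixon_ratio(light, suspect="largest")

# Example 2: acceleration of gravity, smallest value suspected.
gravity = [986, 964, 989, 1000, 987, 909, 999]
r_gravity_7 = dixon_ratio(gravity)
r_gravity_8 = dixon_ratio(gravity + [971])

print(f"{r_light:.3f} {r_gravity_7:.3f} {r_gravity_8:.3f}")
```

Each ratio is then compared with the Table I critical value for the same sample size and the chosen significance level.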
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